Validation of the FAA Air Traffic Control Specialist Pre-Training Screen


The United States Federal Aviation Administration (FAA) is charged
with managing the U.S. airspace. Air traffic controllers are at the
heart of a web of radars, computers, and communication facilities
that ensure the safety and efficiency of an increasingly complex air
transportation system. Appropriate selection of personnel into a
training program for these critical positions is an important human
factors problem. This paper describes research conducted by the FAA
to validate a cost-effective air traffic control specialist (ATCS)
selection procedure. The project resulted in the implementation of a
new selection test for ATCS applicants in June 1992 that was
radically different from any previous ATCS selection program
undertaken in the U.S.

Project  background 

The ATCS selection process between fiscal years 1986 and 1992
consisted of two major tests: (a) a 4-hour written aptitude
examination administered by the United States Office of Personnel
Management (OPM); and (b) a 9-week initial training program
administered by the FAA Academy. Between 1984 and 1992, over 200,000
applicants took the written OPM aptitude examination across the
country at a cost of about $20 per examinee (J. Aul, personal
communication). Between October 1985 and January 1992, just 12,869
of those 200,000+ applicants were selected to attend the FAA Academy
ATCS Nonradar Screen (“ATCS Screen”). The direct cost of this second
stage in the selection process was about $10,000-12,000 per student
(Gwen Sawyer, June 1990). Of those students entering the ATCS
Screen, 7,091 successfully graduated and entered into on-the-job
training. This two-step selection process cost the FAA between $20
million and $23 million annually to obtain approximately 1,400
trainee or “developmental” controllers.
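
As a rough check on these figures, the graduation rate and the Screen cost per graduate can be worked out from the counts above. The sketch below is illustrative arithmetic only: the variable names are ours, and the per-graduate cost is a derived quantity that the source does not report.

```python
# Counts and costs quoted in the text (October 1985 to January 1992).
entrants = 12_869                      # students who entered the ATCS Screen
graduates = 7_091                      # students who passed and entered on-the-job training
cost_low, cost_high = 10_000, 12_000   # direct Screen cost per student, in dollars

pass_rate = graduates / entrants

# Derived here, not reported in the source: Screen cost per successful graduate.
cost_per_graduate_low = entrants * cost_low / graduates
cost_per_graduate_high = entrants * cost_high / graduates

print(f"pass rate: {pass_rate:.1%}")
print(f"Screen cost per graduate: ${cost_per_graduate_low:,.0f} "
      f"to ${cost_per_graduate_high:,.0f}")
```

The pass rate works out to about 55%, and because roughly 45% of entrants did not graduate, the Screen cost attributable to each successful graduate is noticeably higher than the per-student cost.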

The written aptitude test and ATCS Screen selection process also
imposed significant costs on applicants. Applicants selected to
attend the ATCS Screen had to leave their current jobs and families
for 9 weeks with a 55-60% chance of remaining in the controller
occupation at the end of the program. That risk may have discouraged
potentially qualified women and minority persons from pursuing an
air traffic career (ASI, 1991). The FAA undertook a major review of
its ATCS selection and training programs in 1990 to address these
agency and applicant costs and other concerns. Three major ATCS
selection policy goals were identified for the project: (1) reduce
the costs of ATCS selection; (2) maintain the validity of the ATCS
selection system; and (3) support agency cultural diversity goals.
The first step toward achieving these goals was to develop and
validate a test battery to replace the 9-week ATCS Screen.

Proposed  test  battery 

Development of the new test battery began in late 1990 by reviewing
available information about the cognitive requirements of the ATCS
job. As described in one recent cognitive task analysis, controllers
attend to multiple information sources, assess and integrate the
data, develop and prioritize plans of action, and implement those
plans under time pressure while maintaining situational awareness
(Human Technology, Inc., 1991). To assess the cognitive and sensory
attributes required to perform these job functions, a proposed test
battery was developed by ASI. The battery was developed within the
conceptual framework provided by Multiple Resources Theory
(Rodriquez, Narayan, & O’Donnell, 1986; Shingledecker, 1984;
Wickens, 1984). Two computer-administered information processing
tests were designed to dynamically assess cognitive attributes such
as spatial reasoning, short-term memory, movement detection, pattern
recognition, and attention allocation (ASI, 1991). In addition, a
low-fidelity radar simulation of air traffic control vectoring and
separation tasks was developed as a computer-administered work
sample. The information processing tests and the work sample
required performance of concurrent, multiple tasks by candidates to
reflect the job demands placed on controllers.

The 2 computerized information processing tests were (a) the Static
Vector/Continuous Memory test (SV/CM) and (b) the Time Wall/Pattern
Recognition test (TW/PR). In the Static Vector (SV) component of the
first test, a pair of simulated aircraft was presented on the left
half of the computer screen (Figure 1). A quasi-data block for each
target gave speed (“S250” was 250 knots) and altitude (“A250” meant
25,000 feet). The subject’s task was to determine as rapidly and
accurately as possible if the simulated aircraft were in conflict
based on their altitude, speed, and spatial relationship. The
Continuous Memory (CM) component on the right side of the screen
presented 2 aircraft call signs, one above and the other below a
line. The subject’s task was to remember the bottom call sign
(“Target call sign” in Figure 1) because, in the next CM trial, the
subject had to indicate if the call sign above the line (“Probe call
sign” in Figure 1) was the same as the one that had been presented
below the line in the previous CM trial. However, the subject had to
encode what was now the bottom call sign before responding, because
as soon as an answer was made, a new set of call signs appeared. The
attention director at the bottom center of the SV/CM screen informed
the subject which task (SV or CM) was to be performed for each
trial. A fixed number of trials for each component (SV and CM) was
administered in a 5-minute SV/CM session. The speeds, altitudes, and
spatial relationships between aircraft in the SV and the call signs
in the CM varied from trial to trial within the session. Performance
feedback was provided at the end of each session on each component
(SV and CM).
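
The Continuous Memory logic amounts to a one-back comparison: the probe on each trial is judged against the target stored on the previous trial. A minimal sketch follows, assuming each trial is a (probe, target) pair of call signs. The function name and the example call signs are hypothetical, and because the source does not say how the first trial (which has no previous target) was handled, it is simply skipped here.

```python
from typing import List, Tuple

def cm_correct_answers(trials: List[Tuple[str, str]]) -> List[bool]:
    """Return the correct same/different answer for each CM trial after the first.

    Each trial is (probe_call_sign, target_call_sign): the probe is shown above
    the line, the target below it. The correct answer on trial i is whether the
    probe matches the target remembered from trial i - 1.
    """
    answers = []
    previous_target = None
    for probe, target in trials:
        if previous_target is not None:
            answers.append(probe == previous_target)
        # The new target has to be encoded before responding, because the
        # display is replaced as soon as an answer is given.
        previous_target = target
    return answers

# Hypothetical call signs, for illustration only.
trials = [("N123", "UA45"), ("UA45", "DL7"), ("AA9", "N88"), ("N88", "N88")]
print(cm_correct_answers(trials))  # [True, False, True]
```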

FIGURE 1. STATIC VECTOR (SV)/CONTINUOUS MEMORY (CM) SCREEN. SV test
is shown on the left-hand side of the screen, CM test on the right.
When the attention director was to the left, the subject's task was
to decide if the aircraft targets would collide or not, based on the
altitude ("A230") and speed ("S300") information in the data blocks
and spatial relationships of the targets. When the attention
director was to the right, the subject's task was to first, memorize
the target call sign below the line, and second, indicate if the
probe call sign above was the same as, or different from, the target
call sign that had been presented below the line in the previous CM
trial.

The TW/PR test also consisted of a set of paired tasks (Figure 2).
In the Time Wall (TW) component, a square target appeared first,
moving from left to right at a steady speed toward the “wall” on the
far right of the screen. After an initial time interval, the moving
target and wall disappeared and were replaced by pairs of patterns.
The Pattern Recognition (PR) task was to decide if the patterns were
identical while keeping in mind the continuing movement of the TW
target toward the wall. The TW task was to stop the now invisible
target as close as possible to, without actually hitting or passing
through, the wall. Subjects were presented with a fixed number of
TW/PR trials within a nominal 5-minute test session; the actual
length of the session was a function of subject response time. For
example, consistently stopping the moving target in the TW short of
the wall by a large margin reduced total session time
proportionately. Measures from the SV, CM, and PR components
included the mean percent correct and mean reaction time for correct
responses across trials within the 5-minute sessions for each test
pair; the TW measure was the absolute distance (in milliseconds)
between the wall and target when stopped by the subject. Performance
feedback on these measures was provided to the subjects at the end
of each 5-minute session.
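
The component measures described above can be sketched as follows. This is an illustrative reconstruction, not the battery's actual scoring software: the trial record layout, the function names, and the example data are assumptions.

```python
from statistics import mean
from typing import List, Tuple

def score_component(trials: List[Tuple[bool, float]]) -> Tuple[float, float]:
    """Score SV, CM, or PR trials given as (was_correct, reaction_time_ms).

    Returns (percent correct, mean reaction time over correct responses only).
    """
    percent_correct = 100.0 * sum(ok for ok, _ in trials) / len(trials)
    correct_rts = [rt for ok, rt in trials if ok]
    mean_correct_rt = mean(correct_rts) if correct_rts else float("nan")
    return percent_correct, mean_correct_rt

def tw_absolute_error(stop_times_ms: List[float], wall_times_ms: List[float]) -> float:
    """Mean absolute gap, in milliseconds, between when the target was stopped
    and when it would have reached the wall."""
    return mean(abs(stop - wall) for stop, wall in zip(stop_times_ms, wall_times_ms))

# Hypothetical data for one session.
pr_trials = [(True, 800.0), (False, 650.0), (True, 900.0), (True, 700.0)]
print(score_component(pr_trials))                              # (75.0, 800.0)
print(tw_absolute_error([4800.0, 5100.0], [5000.0, 5000.0]))   # 150.0
```

Restricting reaction time to correct responses keeps fast guesses from improving the speed score.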


FIGURE 2. TIME WALL (TW)/PATTERN RECOGNITION (PR) SCREENS. First,
the target appeared, moving from left to right at a steady speed
toward the "wall" (Top screen). After an initial time interval, the
target and wall were masked by a pair of patterns (Middle screen).
The subject's task was to decide if the patterns were the same or
different. A new pair of patterns appeared after each response was
made. However, the subject had to keep in mind the continuing
movement of the TW target toward the wall, as the TW task was to
stop the target (Bottom screen) as close as possible to, without
actually hitting or passing through, the wall.

The Air Traffic Scenario Test (ATST; Figure 3), the
computer-administered work sample component of the proposed test
battery, was developed by 4 subject matter experts with more than 30
years of air traffic control experience (ASI, 1991). The task
required the subject to control aircraft within a simplified
synthetic airspace, directing them to their destinations according
to a small set of rules. There were 6 destinations: 4 outbound
gates, A, B, C, and D; and 2 airports, E and F. The direction of
travel, speed, and altitude of the aircraft, represented by small
arrows next to the quasi-data blocks, were controlled by mouse.
Three alphanumeric characters comprised the quasi-data blocks:
first, aircraft speed (Slow, Medium, or Fast); second, altitude
(1 = Lowest, 4 = Highest); and third, destination. The orientation
of the aircraft arrow indicated its current direction of flight. An
open circle in an upper corner of the data block indicated an
aircraft waiting to be activated by (“handed off to”) the subject.
The large arrow in the lower right-hand corner of the screen
indicated the landing direction at airports E and F, while the
bottom horizontal bar icon represented the minimum lateral
separation distance. Aircraft landed at airports E and F at the
lowest altitude and slow speed in the required direction; aircraft
exited gates A, B, C, and D at the fastest speed and highest
altitude. A difference in altitude between any two aircraft was
considered adequate separation; aircraft at the same altitude had to
be separated by at least 5 nautical miles as represented by the
separation icon. In addition, all aircraft had to be separated from
the airspace boundary by at least 5 nautical miles. Error counts
were obtained and summed to create an overall error score. In
addition, the system automatically computed the difference between
the actual time to reach destination for each aircraft and the time
required for the optimum flight path as determined by the system
software. This en route delay time was summed with the time each
aircraft spent waiting to be activated as a measure of overall
controller efficiency. Performance feedback on these measures was
provided to subjects at the end of each of 20 practice scenarios.
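
The separation and efficiency rules above can be illustrated with a small sketch. It assumes a rectangular airspace with positions in nautical miles and altitudes coded 1 to 4. The `Aircraft` class, the function names, and the rectangular boundary are illustrative assumptions, since the source does not describe the internals of the ATST software or how boundary separation was handled at the exit gates.

```python
import math
from dataclasses import dataclass

MIN_SEPARATION_NM = 5.0  # minimum lateral and boundary separation given in the text

@dataclass
class Aircraft:
    x: float        # position east of the airspace origin, nautical miles
    y: float        # position north of the airspace origin, nautical miles
    altitude: int   # coded altitude level, 1 (lowest) to 4 (highest)

def separation_error(a: Aircraft, b: Aircraft) -> bool:
    """True if two aircraft violate separation: any altitude difference is
    adequate; at the same altitude they must be at least 5 nm apart."""
    if a.altitude != b.altitude:
        return False
    return math.hypot(a.x - b.x, a.y - b.y) < MIN_SEPARATION_NM

def boundary_error(a: Aircraft, width_nm: float, height_nm: float) -> bool:
    """True if an aircraft is within 5 nm of the (assumed rectangular) boundary."""
    nearest_edge = min(a.x, width_nm - a.x, a.y, height_nm - a.y)
    return nearest_edge < MIN_SEPARATION_NM

def total_delay(actual_s: float, optimum_s: float, waiting_s: float) -> float:
    """En route delay (actual minus optimum flight time) plus time spent
    waiting to be activated, for one aircraft."""
    return (actual_s - optimum_s) + waiting_s

# Hypothetical aircraft, for illustration only.
a = Aircraft(10.0, 10.0, 3)
b = Aircraft(13.0, 14.0, 3)   # exactly 5 nm from a, same altitude: adequate
c = Aircraft(12.0, 10.0, 3)   # 2 nm from a, same altitude: error
d = Aircraft(12.0, 10.0, 2)   # 2 nm from a, different altitude: adequate
print(separation_error(a, b), separation_error(a, c), separation_error(a, d))
print(boundary_error(Aircraft(3.0, 20.0, 1), 50.0, 50.0))
print(total_delay(620.0, 600.0, 45.0))
```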

FIGURE 3. AIR TRAFFIC SCENARIO TEST (ATST) SCREEN. The boundary
encloses a simplified airspace, with 4 outbound gates, A, B, C, and
D and 2 airports, E and F. The aircraft and direction of flight are
represented by the arrows adjacent to a data block. The alphanumeric
data block indicates aircraft speed (S, M, or F) and altitude
(1 = lowest, 4 = highest). Aircraft waiting to be handed off are
tagged with a small open circle in the upper right-hand corner of
the data block. Aircraft are controlled with a mouse. First, the
candidate clicks on an aircraft, and then clicks on the appropriate
element of either the direction control, altitude control, or speed
control icons to change that flight parameter. Subjects are reminded
of the required landing direction at airports and minimum horizontal
separation distance by the landing direction and separation distance
icons respectively.

Study 1:

Predictive, criterion-related validation

Two validation studies of this proposed test battery were conducted
by the FAA in 1991. The purpose of the first study was to assess the
predictive, criterion-related validity of the proposed test battery,
and to determine the incremental validity of the proposed
computerized tests over the existing written test. The sample in the
first predictive, criterion-related validation study consisted of
the 423 newly hired air traffic control students who entered the
ATCS Screen in March and April 1991 in accordance with existing FAA
procedures and policies. The sample was predominantly male (77.1%)
and non-minority (83.0%); most (88.7%) had entered federal service
by competitive examination rather than by non-competitive special
appointment. There were significantly more women (22.9%) in this
validation sample compared to the population of students that had
entered the ATCS Screen between October 1985 and January 1991
(18.9%; Z = 2.05, p < .05). Similarly, minorities were also
over-represented in this validation sample (17.0%) in comparison to
the population of ATCS Screen students (10.2%; Z = 2.38, p < .05).
The majority (65%) had no prior aviation-related experience, which
was representative of the population that reached the second stage
of the ATCS selection system. Aptitude scores for the ATCS
occupation, represented in this study by the variable RATING, are
based on the civil service test scores earned by an applicant on the
written aptitude test plus any statutory veteran’s preference
points. The general development, psychometric characteristics, and
validity of the written aptitude test battery have been extensively
described (Sells, Dailey, & Pickrel, 1984). RATING was used to
rank-order competitive applicants within statutory guidelines such
that hiring was done on the basis of merit (Aul, 1991).

Criterion  for  predictive  validation 

The criterion for this predictive study was the final composite
score earned in the ATCS Screen. The ATCS Screen was originally
established in response to recommendations by the U.S. Congress
House Committee on Government Operations (U.S. Congress, 1976) to
reduce field training attrition rates. The ATCS Screen was based
upon a miniaturized training-testing-evaluation personnel selection
model (Siegel, 1978, 1983). Thirteen performance assessments,
including classroom tests, laboratory simulations of nonradar air
traffic control, and a final written examination, were made during
the course of the ATCS Screen (Della Rocco, Manning, & Wing, 1990).
The final summed composite score (SCREEN) of these ATCS Screen
performance measures was weighted 20% for classroom tests, 60% for
laboratory scores, and 20% for the final examination, with a minimum
score of 70 out of 100 required to pass. In this sample, 56.0%
passed the ATCS Screen, 27.7% failed, and 16.3% withdrew prior to
completion. The mean SCREEN score of 71.8 (SD = 11.8) for this
validation sample of 423 students was not significantly different
from that of the population of students that had entered the ATCS
Screen between October 1985 and January 1991.
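
The weighting of the SCREEN composite can be written out as a short sketch. It assumes each of the three component scores has already been expressed on a 0-100 scale; how the thirteen individual assessments were combined within each component is not stated in the source, and the function name and example scores are hypothetical.

```python
PASSING_SCORE = 70.0  # minimum composite score required to pass, out of 100

def screen_composite(classroom: float, laboratory: float, final_exam: float) -> float:
    """Weighted ATCS Screen composite on a 0-100 scale:
    20% classroom tests, 60% laboratory scores, 20% final examination."""
    return 0.20 * classroom + 0.60 * laboratory + 0.20 * final_exam

# Hypothetical student: strong classroom and exam scores, weaker laboratory score.
score = screen_composite(classroom=85.0, laboratory=65.0, final_exam=80.0)
print(round(score, 1), score >= PASSING_SCORE)
```

Because the laboratory simulations carry 60% of the weight, a student's standing depends mostly on simulated nonradar control performance rather than on written tests.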

Procedure 

The proposed test battery was administered in 2 waves to subjects
the week prior to beginning the ATCS Screen. The subjects were
tested in March and April 1991 at the FAA Civil Aeromedical
Institute (CAMI) in Oklahoma City. Instructions on the test battery
were given on Monday morning. A total of 20 SV/CM and 20 TW/PR
practice sessions were administered to subjects across 3.5 days
(Monday afternoon through Thursday). The SV/CM and TW/PR tests did
not change in difficulty across sessions. Subjects also were given
20 practice scenarios for the ATST, building in complexity and
difficulty from about 12 aircraft in 30 minutes to over 40 aircraft
in less than 30 minutes in the final practice sessions. Performance
feedback was provided to subjects after each practice session. On
Friday, subjects received a final 4 SV/CM sessions, 4 TW/PR
sessions, and 6 ATST scenarios. Measures were averaged across these
final graded sessions within test, yielding 8 proposed test scores:
(1) SV average percent correct; (2) SV average correct response
reaction time; (3) CM average percent correct; (4) CM average
correct response reaction time; (5) TW average absolute error; (6)
PR average correct response reaction time; (7) average ATST error
score; and (8) summed delay and waiting times in the ATST scenario.
Aptitude ratings and ATCS Screen scores were extracted for the 423
subjects from the CAMI research data bases after all subjects had
completed the ATCS Screen. These data were matched with proposed
test scores for analysis; proposed test scores were not used in any
way to make employment decisions about the subjects.

Results 

On one hand, performance on the SV/CM and TW/PR tests appeared to
reach differential stability (Bittner, 1979) at about the 15th
session. The average performance within test component across the
final sessions represented a reasonable measure of asymptotic
individual differences on those tests. On the other hand, learning
curve analyses were not possible with the ATST because scenario
difficulty increased across sessions. However, average performance
across the final 6 scenarios was still computed as the index of
individual differences on that test component. Multiple regression
analysis was used to assess how well the proposed test battery
predicted student performance in the ATCS Screen after taking into
account student aptitude. First, RATING was entered into the
regression equation predicting SCREEN. There was a statistically
significant linear relationship between RATING and SCREEN of
R = .23, p < .001, where R was the multiple correlation between
predictor (RATING) and criterion (SCREEN) and p < .001 indicated
that an R of this magnitude would be expected by chance alone fewer
than 1 time in a thousand. The relationships of proposed test
battery average final scores to SCREEN were analyzed in the second
step of the multiple regression analysis using a forward stepwise
procedure to determine the optimal combination of predictor
variables. In a forward stepwise multiple regression analysis, the
proposed test score accounting for the most variability left in the
criterion SCREEN entered the regression equation first; then, one at
a time, proposed test scores which accounted for the most of the
remaining unexplained variability in SCREEN were added to the
equation, until the amount of variability explained by a new score
became statistically nonsignificant. The optimal linear combination
of proposed test scores accounted for an additional 20%
(ΔR² = .20, p < .001) of the variability in SCREEN over the
proportion of variability already explained by student aptitude
scores (RATING). There were no statistical differences in the
prediction equation by sex and minority status (ASI, 1991),
suggesting that the proposed test battery might not adversely impact
protected classes of applicants.
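
To put the two regression steps on a common footing, the reported correlation for RATING can be converted to a proportion of variance and combined with the additional 20% explained by the new test scores. The sketch below is simple arithmetic on the two reported values; the total variance explained and the implied overall multiple correlation are derived here and are not figures reported in the source.

```python
import math

r_rating = 0.23   # multiple correlation of RATING with SCREEN (step 1)
delta_r2 = 0.20   # additional variance explained by the new test scores (step 2)

r2_step1 = r_rating ** 2          # variance explained by RATING alone
r2_total = r2_step1 + delta_r2    # variance explained by the full equation
r_total = math.sqrt(r2_total)     # implied overall multiple correlation (derived)

print(f"step 1 R^2 = {r2_step1:.3f}")
print(f"total  R^2 = {r2_total:.3f}")
print(f"implied R  = {r_total:.2f}")
```

On these figures the written aptitude rating alone explained roughly 5% of the variance in SCREEN, so the computerized tests accounted for most of the predictable variance in the combined equation.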

Study  2: 

Concurrent,  criterion-related  validation 

Encouraged by the results of the initial predictive study, the FAA
conducted a concurrent, criterion-related validation study to assess
the validity of the proposed test battery as a replacement for the
ATCS Nonradar Screen (Weltin, Broach, Goldbach, & O’Donnell, 1991).
The sample for this second validation study was composed of 297
trainee (“developmental”) and full performance level (FPL)
controllers. While this sample was predominantly male (64.6%) and
non-minority (61.6%), women and minorities were oversampled relative
to their representation in the ATCS workforce. The majority of the
sample was drawn from en route centers (58.2%), reflecting the
historical employment patterns in the workforce; 49.2% had attained
FPL certification. The final composite SCREEN score for each
participant was extracted from the CAMI ATCS Selection data base and
used as the current predictor in this study. The SV/CM, TW/PR, and
ATST average test scores described in the first study were the
alternative predictors in this validity study. The ATCS Pre-Training
Screen (ATCS/PTS), as the proposed battery had come to be known, was
administered to subjects during late summer 1991 using the same test
administration protocols as in the first study.

Criterion  for  concurrent  validation 

This study was constrained to use available training performance
indices as validation criteria; no other criteria were developed or
collected. These indices included the number of days spent in
particular phases of field training and hours of formal, documented
on-the-job training (OJT) provided under the supervision of a
designated OJT Instructor within those phases, as reported by field
ATC facilities in accordance with national policy (FAA, 1985).
Subjective ratings of developmental performance in that phase of
training (1 = Bottom 10% compared to all other controllers observed
in training, 6 = Top 10% compared to all other controllers observed
in training) were also available for each participant in this second
validation study. Data for the ground, local, and radar control
phases of instruction were extracted from the CAMI ATCS Training
Tracking data base for subjects drawn from FAA terminal facilities.
The ground control phase qualified a developmental to control the
movement of departing and arriving aircraft on the airport surface,
including ramps and taxiways. Local control developed the skills to
control arriving and departing aircraft on the active runways and in
the immediate visual airspace of the terminal. Radar control taught
techniques and procedures for the control of aircraft arriving in
and departing from the terminal’s extended airspace for facilities
equipped with radar. Data on the initial radar associate and initial
radar qualification phases of training were collected for en route
subjects. The en route radar associate phase qualified the
developmental controller to initiate and accept radar hand-offs and
point-outs, perform flight data entries, maintain flight progress
strips, and communicate with aircraft and other facilities by
interphone and radio as directed by the radar controller on a
position. In contrast, the goal of the radar qualification phase of
instruction was to qualify the developmental as the radar controller
on two positions or sectors within the assigned area of
specialization. The radar controller has overall responsibility for
the safe, orderly, and expeditious movement of air traffic within
the assigned sector of airspace. Performance assessments from
additional radar training conducted by the FAA Academy were also
extracted from the research data bases where available for subjects.
FAA Academy radar training provided instruction in critical radar
techniques and procedures in the safety of a simulated airspace.

An overall standardized composite score for each of the 297
participants in this validation study was created from these
time-to-complete measures, performance assessment measures, and FAA
Academy radar training assessments. This training performance
(TRNGPERF) composite criterion represented the rate and quality of
progress in training for an individual relative to peers assigned to
the same type and level of facility that had completed the same
curriculum. The mean TRNGPERF score was 0.44 (SD = .30), with a
range of 0 to 1. A criterion score of 0 indicated consistently
poorer performance than peers (longer than average times to complete
and lower assessments of quality). A score of 1 reflected
consistently higher performance than peers (shorter than average
times and higher assessments); an intermediate score of .50
indicated consistently average performance relative to peers
assigned to the same type and level of facility.

Results 

Correlations were computed between the current predictor SCREEN,
alternative ATCS/PTS predictors, and the criterion. The correlation
matrix was corrected for explicit and incidental restriction in
range due to prior selection of the sample on the current SCREEN
predictor (see Ghiselli, Campbell, & Zedeck, 1981) and submitted for
regression analysis. The corrected multiple correlation between the
ATCS/PTS average final scores and TRNGPERF was R = .25 (uncorrected
R = .21, p < .05) compared to R = .19 (uncorrected R = .11, p < .05)
for the current SCREEN predictor. While modest, the validity
coefficient of .25 for the ATCS/PTS indicated that a prediction
about probable performance in field training for an individual could
be made from knowledge of his or her scores on the computerized test
battery. Moreover, the validity of the proposed 5-day test battery
was at least equal to that of the existing 9-week
training-as-screen. Subsequent analyses again suggested that the
validities of the ATCS/PTS and ATCS Screen did not vary as a
function of sex or minority group status (Weltin et al., 1992).
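
The source does not state which correction formula or which standard deviations were used. As one common possibility, the sketch below implements the standard correction for explicit (direct) range restriction on the predictor, often called Thorndike's Case 2. It does not cover the incidental case, the function name is ours, and the standard deviation ratio in the example is hypothetical and chosen only for illustration.

```python
import math

def correct_direct_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Correct a correlation for direct range restriction on the predictor.

    sd_ratio is the unrestricted (applicant) standard deviation of the
    predictor divided by its restricted (selected sample) standard deviation.
    """
    u = sd_ratio
    numerator = r_restricted * u
    denominator = math.sqrt(1.0 - r_restricted ** 2 + (r_restricted * u) ** 2)
    return numerator / denominator

# Hypothetical SD ratio; the actual value is not given in the source.
print(round(correct_direct_range_restriction(0.11, 1.5), 3))
```

With a ratio of 1 (no restriction) the function returns the observed correlation unchanged, and larger ratios give larger corrected values, which is why corrected coefficients exceed their uncorrected counterparts.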

Study  3: 

Comparison of ATCS/PTS to job attribute requirements

A third analysis (Broach & Aul, 1993) of the ATCS/PTS was undertaken
after it was validated in order to independently compare test
constructs with job cognitive attribute requirements. During the
data collection phase of the second study, FAA psychologists and
technicians interviewed 52 of the incumbent FPL controllers from all
types and levels of air traffic control facilities. Example facility
types included Air Route Traffic Control Centers (ARTCC), also known
as En Route centers, Terminal Radar Approach Control (TRACON)
terminals with high traffic densities, Level 3 radar terminals (L3R)
with intermediate traffic densities, and Level 1 and 2 Nonradar
(e.g., VFR Non-approach) towers (L12NR) with lower traffic counts.
The job analysts then completed a Position Analysis Questionnaire
(PAQ; McCormick, Mecham, & Jeanneret, 1977) for each interview.
Estimated requirements for worker attributes of an ability or
aptitude nature were computed from the 52 sets of job ratings by PAQ
Services, Incorporated, based on their data base of over 2,000 jobs
in the U.S. economy. Preliminary data from this third analysis in
the form of estimated percentiles for cognitive and general
intelligence attributes are illustrated in Figure 4 for two selected
air traffic control facility types and levels. The percentile
estimates the proportion of the jobs in the PAQ data base for which
an attribute received the same or lower relevance scores as the job
being analyzed (Mecham & McCormick, 1969; Mecham, McCormick, &
Jeanneret, 1977; McCormick, Jeanneret, & Mecham, 1972).

FIGURE 4. ATCS JOB ABILITIES PROFILE. Ability attributes are listed
along the vertical axis. The 0 - 100 scores along the horizontal
axis indicate the estimated proportion of jobs in the PAQ Services,
Inc. data base for which an attribute received the same or lower
relevance scores than ATCS jobs. Ability requirements for the ATCS
job in Level 1 & 2 nonradar/nonapproach (VFR) towers are contrasted
with the profile for en route Air Route Traffic Control Centers.
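
Taken literally, the percentile definition above is the share of jobs in the reference data base whose relevance score on an attribute is the same as or lower than the score for the job being analyzed. A minimal sketch of that definition follows; the function name and the example scores are hypothetical, and the actual estimation procedure used by PAQ Services, Inc. is not described in the source.

```python
from typing import Sequence

def attribute_percentile(job_score: float, database_scores: Sequence[float]) -> float:
    """Percent of jobs in the data base whose relevance score for an attribute
    is the same as or lower than the score for the job being analyzed."""
    same_or_lower = sum(1 for s in database_scores if s <= job_score)
    return 100.0 * same_or_lower / len(database_scores)

# Hypothetical relevance scores for one attribute across ten jobs.
scores = [1.2, 2.0, 2.5, 3.1, 3.1, 3.8, 4.0, 4.4, 4.9, 5.0]
print(attribute_percentile(4.0, scores))   # 70.0
```

A high percentile therefore means that the attribute is more relevant to the analyzed job than to most other jobs in the data base, not that incumbents score highly on the attribute.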

These analyses by Broach and Aul (1993) suggested that perceptual
speed, closure, simple reaction time, and short-term memory were
more relevant to the controller job than to many other jobs in the
U.S. economy. Numerical computation, arithmetic reasoning,
convergent and divergent thinking also appeared to be more relevant
to performance in the ATCS occupation. But contrary to expectation,
time sharing, selective attention, spatial visualization, and
spatial orientation were not more relevant to air traffic control
than to other U.S. occupations. In other words, there appears to be
a substantial proportion of jobs in the U.S. economy to which
spatial and attention allocation abilities are more relevant than to
the controller occupation. Finally, with the exception of spatial
abilities as illustrated in Figure 4, the cognitive abilities
requirements for controllers appeared to be reasonably homogeneous
across facility types and levels. The requirement for spatial
visualization appeared to be more relevant to terminal facilities
than to en route facilities.

Overall, tests that represented perceptual speed,
closure, reaction time, memory, arithmetic reasoning,
and some degree of spatial ability would be expected to
predict performance in both en route and terminal
environments. In order to evaluate the correspondence
between test and job requirements, PAQ ratings of the
proposed test battery were completed by a single,
highly experienced PAQ consultant from Jeanneret
and Associates. The resulting cognitive attribute
requirements for the test battery are illustrated in Figure
5 for this preliminary study of the correspondence
between test and job. While no formal statistical analyses
have been conducted as yet, there appeared to be
some degree of similarity between the test and job
profiles in kind, if not degree. For example, the
requirements for perceptual speed and simple reaction
time were similar between the ATCS job and the TW/PR
and ATST tests. While the attribute percentile
scores for the ATCS/PTS were generally lower, the
shape of the profile across basic mental abilities such as
memory and attention and higher-order skills such as
numerical computation and divergent thinking was
broadly similar to that of the job. Overall, these early
data suggested at least some degree of correspondence
between proposed test battery and job attribute
requirements; further analyses, using multiple raters to
evaluate the test battery, will provide a basis for a more
definitive assessment.

FIGURE 5. ATCS/PTS ABILITIES PROFILE. Ability attributes are listed along the vertical axis. The
0-100 scores along the horizontal axis indicate the estimated proportion of jobs in the PAQ
Services, Inc. data base for which an attribute received the same or lower relevance scores. Ability
requirements for the SV/CM, TW/PR and ATST are illustrated.
Attributes shown: perceptual speed, closure, simple reaction time, long-term memory,
short-term memory, time sharing, selective attention, spatial visualization, spatial orientation,
numerical computation, arithmetic reasoning, convergent thinking, divergent thinking,
intelligence, ideational fluency, originality, problem sensitivity.

DISCUSSION

Two formal validation studies on a total of 720
subjects demonstrated that the ATCS/PTS was a viable
replacement for the ATCS Screen as the 2nd
hurdle in the FAA’s ATCS selection system. The first
study demonstrated that the computer-administered
test battery explained some of the variability in scores
earned in the ATCS Screen, even after taking into
account student aptitude. The second study found
that ATCS/PTS was about as valid as the ATCS Screen
in predicting relative performance in ATCS field technical
training. The new test battery was objectively
administered and scored, and the validity of the new
test battery did not appear to vary as a function of sex
and minority status. Finally, the ATCS/PTS achieved
the major policy goal of reducing the cost of selection at
the 2nd hurdle in the ATCS selection process from about
$10,000 to about $2,000 per candidate. Therefore, the
FAA Academy ATCS Nonradar Screen was terminated
in March 1992 and the ATCS/PTS became operational
as the FAA’s 2nd stage selection test in June
1992 on the basis of the results of the second, concurrent
validation study. The ATCS selection system now
consists of the 4-hour written ATCS aptitude test
battery followed by, for those applicants earning a
qualifying score, second-level screening on the ATCS/PTS.
The final ATCS/PTS protocol provides 20 SV/CM,
20 TW/PR, and 20 ATST practice sessions over
2.3 days (Monday afternoon through Wednesday),
followed by the final 4 SV/CM, 4 TW/PR, and 6
ATST “for grade” testing sessions on Thursday. Candidates
are informed of the outcome of screening on
Friday. Those who successfully complete the ATCS/PTS
are then eligible for hiring by the FAA and
subsequent enrollment in the FAA Academy ATCS
training programs. In this new system, all selection is
accomplished prior to the actual hiring and subsequent
training of entry-level controllers.

The ATCS/PTS represents a major policy and research
initiative for the FAA. As noted by Ackerman
(1991), ATCS selection research represents a praxis of
public policy, psychological theory, and psychometric
practice. Continuing research is required to assess the
longitudinal fairness of the new battery in order to
satisfy legal and human resource policy requirements.
An additional research requirement is to develop and
validate an expanded test battery. Only cognitive abilities
are assessed by the current version of the ATCS/PTS.
But non-cognitive factors such as biographical
data have been shown to be useful predictors of near-term
criteria such as the ATCS Screen (Collins, Nye,
& Manning, 1992) and criteria such as performance in
radar-based training 1 to 2 years after entry into the
occupation (Broach, 1992). Personality has similarly
shown promise in several studies as a predictor of near-term
performance (Schroeder, Broach, & Young, 1992;
Nye & Collins, 1991). Development of an expanded
test battery might enable the agency to implement a
single-hurdle selection system, further reducing the
financial costs of ATCS selection. A third important
research requirement is the development of appropriate
measures of ATCS job performance. What is validated
in personnel selection research is the hypothesis
that job performance, or important aspects of job
performance, can be inferred from test scores (Guion,
1992). For example, given the nature of the criterion
in the concurrent validation study, the only fully
justified inference that currently can be drawn from
ATCS/PTS scores is how rapidly a person might be
expected to complete field ATCS training relative to
other developmentals (slower or faster than average,
overall). Inferences about probable technical job
performance, such as efficiency in separating aircraft and
orderliness of the flow of aircraft, will require development
of different criterion measures. Similarly, inferences
about attrition from the ATCS occupation from
ATCS/PTS scores will have to await results of longitudinal
evaluations of the Study 1 students and Study 2
developmentals as they progress through the field
training program. Other important ATCS selection
issues include differential assignment to facility types
and levels based on test score profiles and assessment of
selection system utility. Finally, as the controller
occupation changes, the ATCS selection process must also
change. The emerging Advanced Automation System
may (or may not) have profound implications for
ATCS selection (see Manning & Broach, 1992, for an
early exploratory study). Systematic and continuous
selection-oriented research is strongly recommended
as an integral part of ATC systems design specifically
and the national aviation human factors research plan
generally.